Storytelling with Rmarkdown

Data Analysis and Bioinformatics

2024-02-12

Learning outcomes

  • Why use (R)markdown?
  • Knitting Rmarkdown to HTML, PDF and Word
  • Configuring knitting and code options

What is Markdown?

Markup

<html>
<body>
<h1>This is an HTML page</h1>
Hello world!
<br>
<img src="https://rmarkdown.rstudio.com/docs/reference/figures/logo.png">
</body>
</html>

What is Markdown?

Markdown

# This is an HTML page

Hello world!

![](https://rmarkdown.rstudio.com/docs/reference/figures/logo.png)

What is Markdown?

RMarkdown

# This is an HTML page

Hello world!

![](https://rmarkdown.rstudio.com/docs/reference/figures/logo.png)

```{r}
summary(cars)
plot(cars)
```

Rmarkdown integrates prose + code

Why Rmarkdown

R analysis .r

data <- read.csv("processed_dataset.csv")
results <- analyze(data)
plot(results)

analysed_data <-
    read_analysed_data("analysed_data.txt")
plot(analysed_data)

Bash preprocessing .sh

sed 's/pattern/replacement/' dataset.csv > processed_dataset.csv

Analyze some data with Python

data = read_data("input_data.csv")
analysis = analyze_with_python_package(data)

Why Rmarkdown

.rmd

```{bash}
sed 's/pattern/replacement/' dataset.csv > processed_dataset.csv
```

```{r}
data <- read.csv("processed_dataset.csv")
results <- analyze(data)
plot(results)
```

```{python}
data = read_data("input_data.csv")
analysis = analyze_with_python_package(data)
```

```{r}
plot(py$analysis)
```

Why Rmarkdown

.rmd

Here's why I needed to preprocess.

```{bash}
sed 's/pattern/replacement/' dataset.csv > processed_dataset.csv
```

Here's how I used the data

```{r}
data <- read.csv("processed_dataset.csv")
results <- analyze(data)
plot(results)
```

Here are the results from a Python package

```{python}
data = read_data("input_data.csv")
analysis = analyze_with_python_package(data)
```

Final plots

```{r}
plot(py$analysis)
```

Conclusions...

RMarkdown lets you describe, in one document, both what you did (code in several languages) and why you did it.

Why Rmarkdown

Version control

git log -p ./README.md
commit 41645e88a78cc41f43c65a04931fc5ec2b34dacb
Author: James Eapen <james.eapen@vai.org>
Date:   Tue Feb 6 11:23:57 2024 -0500

    fix rmarkdown guide preface link

diff --git a/session_5_rmarkdown/README.md b/session_5_rmarkdown/README.md
index 0f1ed51..4e5bf4f 100644
--- a/session_5_rmarkdown/README.md
+++ b/session_5_rmarkdown/README.md
@@ -36,7 +36,7 @@
       have to remember everything from this - treat it as a reference document
       during class and for your homework.
 
-    - [Preface to R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/installation.html)
+    - [Preface to R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown)
 
 ## Additional Resources
 

commit e7bd2731f703980f9104ecc342c9efc4c40d3364
Author: James Eapen <james.eapen@vai.org>
Date:   Mon Feb 5 15:21:56 2024 -0500

    add session5_rmarkdown pre-class and slides

diff --git a/session_5_rmarkdown/README.md b/session_5_rmarkdown/README.md
new file mode 100644
index 0000000..0f1ed51
--- /dev/null
+++ b/session_5_rmarkdown/README.md
@@ -0,0 +1,92 @@
+# R Markdown
+
+## Learning Objectives
+
+1. Work with Rmarkdown files in RStudio for generating reports and knit them to
+   PDF, HTML, and Word documents.
+
+1. Include R and bash code chunks to run statistical analyses and generate plots
+
+1. Configure how code chunks and plots are knit to the final output document.
+
+## Pre-class assignment
+
+1. Make sure you have the `rmarkdown`, `knitr`, and `tinytex` packages installed
+   by running the following code in the RStudio console. If the result is not
+   TRUE for each, install the missing one with `install.packages("[package
+   name]")`.
+
+    ```r
+    > c('rmarkdown', 'knitr', 'tinytex') %in% installed.packages()
+    # [1] TRUE TRUE TRUE
+    ```
+
+1. Watch these short videos:
+
+    - [RMarkdown](https://vimeo.com/178485416) 
+
+    - [A reproducible workflow](https://www.youtube.com/watch?v=s3JldKoA0zw): a
+      little dramatic, but gets the point across
+
+2. Read the following:
+
+    - [Getting started](https://www.markdownguide.org/getting-started/)
+
+    - [Basic syntax](https://www.markdownguide.org/basic-syntax/): You don't
+      have to remember everything from this - treat it as a reference document
+      during class and for your homework.
+
+    - [Preface to R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/installation.html)
+
+## Additional Resources
+
+- [Markdown guide written by its creator](https://daringfireball.net/projects/markdown/basics)
+
+- [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/): a
+  practical guide to all that you can do in R Markdown written by a statistician
+  who co-authored the [rmarkdown package](https://github.com/rstudio/rmarkdown).
+
+- [Common Problems with rmarkdown (and some
+  solutions)](https://rmd4sci.njtierney.com/common-problems-with-rmarkdown-and-some-solutions.html)
+
+- [Rmarkdown cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/rmarkdown-2.0.pdf)
+
+### What can you do with (R)Markdown
+
+#### Reports
+
+- 2021 VAIGS Biostatistics project: <https://jamespeapen.github.io/expdesign/p1final.html>
+
+- <https://svmiller.com/blog/2023/01/what-log-variables-do-for-your-ols-model>
+
+#### PhD theses
+
+These use the [oxforddown](https://github.com/ulyngs/oxforddown) package which
+is based on [bookdown](https://bookdown.org/).
+
+- <https://thesis.shirdekel.com/thesis.pdf>
+
+- <https://ulyngs.github.io/phd-thesis/_main.pdf>
+
+- <https://gsarti.com/thesis/Sarti_2020_Interpreting_NLMs_for_LCA.pdf>
+
+#### Presentations
+
+- <https://meghan.rbind.io/slides/neair/neair.html#/title-slide>
+
+- [RLadies presentations](https://github.com/rladies/rladies_global_presentations)
+
+#### Websites
+
+- <https://rmarkdown.rstudio.com/>
+
+- <https://yihui.org/en>
+
+- <https://svmiller.com>
+
+- <http://www.jenniferbradham.org>
+
+#### CV
+
+- https://svmiller.com/blog/2016/03/svm-r-markdown-cv/
+

Run git log -p [word document]

Why Rmarkdown

Version control

git log -p ../session_2_Git_and_Github/Jan22_quiz.docx
commit 31d907b06fc8bc89d7148512691c5a56d290b732
Author: Ian Beddows <ianbeddows@c02xg0hvjgh6.vai.org>
Date:   Mon Feb 5 11:43:28 2024 -0500

    added Jan22_quiz

diff --git a/session_2_Git_and_Github/Jan22_quiz.docx b/session_2_Git_and_Github/Jan22_quiz.docx
new file mode 100644
index 0000000..4f824ca
Binary files /dev/null and b/session_2_Git_and_Github/Jan22_quiz.docx differ

Since markdown is plain text, git can show exactly what changed in each version. A Word document is a binary file, so git can only report that it differs.

Why Rmarkdown

Write a story

  • code (R, bash, python, …)

  • plots

  • styling

  • version controlled

  • with multiple output formats

  • citations and bibliography

Use markdown guide as a reference

Knit your own

Questions about markdown?

session5_rmarkdown/examples/example.rmd

How does it work?

YAML: YAML Ain't Markup Language

config_key1: value1
config_key2: value2
config_key3: value3

config_key1: value1
config_key2:
  value2_as_config_key: value2_1
config_key3: value3

RMarkdown output is configured using YAML keys and values
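
To see how nested keys and values are read, the header can be parsed into an R list. This is a minimal sketch, assuming the yaml package is available (it is normally installed alongside rmarkdown); the header text is only an illustration.

```r
# Parse a small YAML header into an R list.
library(yaml)

header <- "
title: Document title
output:
  html_document:
    toc: true
"

config <- yaml.load(header)

config$title                      # a plain key: value pair
config$output$html_document$toc   # a nested key, reached level by level
```

Indentation is what nests one key under another, so `toc` belongs to `html_document`, which belongs to `output`.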

How does it work?

Output options


---
title: "Document title"
subtitle: "Add subtitle"
date: "Feb 12, 2024"
---

How does it work?

Output options


---
title: "Document title"
subtitle: "Add subtitle"
date: "Feb 12, 2024"
output: html_document       # default html settings
---

How does it work?

Output options


---
title: "Document title"
subtitle: "Add subtitle"
date: "Feb 12, 2024"
output:
  html_document:            # specify html settings
    toc: true               # Add table of contents
---

How does it work?

Output options


---
title: "Document title"
subtitle: "Add subtitle"
date: "Feb 12, 2024"
output:
  html_document:
    toc: true
  pdf_document:             # multiple output formats
    toc: true
  word_document:
    reference_docx: reference.docx
---

Refer to Rmarkdown guide for reference.docx setup
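
Knitting can also be started from the R console instead of the Knit button. The following is a minimal sketch, assuming rmarkdown is installed; the throwaway file and its content are hypothetical and only list HTML output so the example does not need LaTeX or Word.

```r
library(rmarkdown)

# Write a tiny throwaway Rmd file (hypothetical content, for illustration only).
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Document title\"",
  "output:",
  "  html_document:",
  "    toc: true",
  "---",
  "",
  "# A heading",
  "",
  "Hello world!"
), rmd)

# Knitting needs pandoc; RStudio bundles it, a bare R install may not.
if (pandoc_available()) {
  # "all" renders every format listed under `output:` in the YAML header.
  out <- render(rmd, output_format = "all", quiet = TRUE)
  print(basename(out))
}
```

With several formats under `output:`, the same call would produce one file per format; PDF output additionally needs a LaTeX installation such as tinytex.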

How does it work?

Bibliography


---
title: "Document title"
subtitle: "Add subtitle"
date: "Feb 12, 2024"
output:
  html_document:
    toc: true
  pdf_document:             # multiple output formats
    toc: true
  word_document:
    toc: true
bibliography: references.bib        # bibliography file with citation info
csl: asm.csl                        # citation style file
---
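
One way to start a bibliography file is to let knitr write entries for the R packages used in the analysis. This is a minimal sketch, assuming knitr and rmarkdown are installed; the temporary path is only for illustration, and a real report would keep references.bib next to the Rmd file.

```r
library(knitr)

# Write BibTeX entries for two installed packages to a .bib file.
bib_file <- file.path(tempdir(), "references.bib")
write_bib(c("knitr", "rmarkdown"), file = bib_file)

# Each entry starts with a citation key such as R-knitr or R-rmarkdown.
grep("^@", readLines(bib_file), value = TRUE)
```

Cite an entry in the prose with `[@R-knitr]`; the reference list is added at the end of the knitted document.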

Knitting options

HTML

---
output:
  html_document:
    toc: true
    toc_depth: 2        # two levels of headings in the table of contents
    theme: yeti         # set the document's visual theme
    code_folding: hide  # hide code inside a dropdown
---

Knitting options

PDF

---
output:
  pdf_document:
    toc: true
---

Knitting options

MS Word

---
output: word_document
---

Knitting options

MS Word

---
output:
  word_document:
    toc: true
    reference_docx: reference.docx  # gets the font and style info from this
---

Knitting options

YAML config

Helper package:

install.packages("ymlthis")

https://ymlthis.r-lib.org/

Themes

rmarkdown:::themes()

    https://bootswatch.com

    Code chunks

    ```{r chunk_name, eval=TRUE, echo=TRUE, include=TRUE, message=TRUE, warning=TRUE}
    ```
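
Typing the same options on every chunk gets repetitive. Below is a minimal sketch of setting document-wide defaults with knitr::opts_chunk$set(), usually placed in a first setup chunk; the particular values chosen here are only an illustration.

```r
library(knitr)

# Set defaults for every chunk that follows.
# Individual chunks can still override them in their own header.
opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

# Check what the current default is for one option.
opts_chunk$get("message")
```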

    Code chunks

    Chunk names

    ```{r}
    cat(does_not_exist)
    ```
    Error in eval(expr, envir, enclos): object 'does_not_exist' not found
    processing file: rmarkdown.qmd
      |....                                     |   8% (unnamed-chunk-2)
      |......                                   |  12% (unnamed-chunk-3)
      |.....................                    |  41% (unnamed-chunk-4)
    Quitting from lines 271-272 (rmarkdown.qmd)
    Error in cat(does_not_exist) : object 'does_not_exist' not found

    Without a label it can be hard to figure out where the chunk with the error is

    Code chunks

    Chunk names

    ```{r this_named_chunk}
    cat(does_not_exist)
    ```
    Error in eval(expr, envir, enclos): object 'does_not_exist' not found
    processing file: rmarkdown.qmd
      |....                                     |   8% (unnamed-chunk-2)
      |......                                   |  12% (unnamed-chunk-3)
      |.....................                    |  41% (this_named_chunk)
    Quitting from lines 271-272 (rmarkdown.qmd)
    Error in cat(does_not_exist) : object 'does_not_exist' not found

    The label identifies the chunk with the error: this_named_chunk

    Code chunks

    What’s written:

    ```{r}
    message("This is a test message")
    warning("This is a test warning")
    colnames(cars)
    ```

    Output:

    message("This is a test message")
    This is a test message
    warning("This is a test warning")
    Warning: This is a test warning
    colnames(cars)
    [1] "speed" "dist" 

    Code chunks

    echo: show code?

    What’s written:

    ```{r, echo=FALSE}
    message("This is a test message")
    warning("This is a test warning")
    colnames(cars)
    ```

    Output:

    This is a test message
    Warning: This is a test warning
    [1] "speed" "dist" 

    Code chunks

    eval: run code?

    What’s written:

    ```{r, eval=FALSE}
    message("This is a test message")
    warning("This is a test warning")
    colnames(cars)
    ```

    Output:

    message("This is a test message")
    warning("This is a test warning")
    colnames(cars)

    Code chunks

    include: show code and output? (include=FALSE still runs the code)

    What’s written:

    ```{r, include=FALSE}
    message("This is a test message")
    warning("This is a test warning")
    colnames(cars)
    ```
    ```{r, echo=FALSE, eval=FALSE}
    message("This is a test message")
    warning("This is a test warning")
    colnames(cars)
    ```

    Output: (nothing appears in the document)
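
A chunk with include=FALSE leaves no trace in the output, but its code still runs, so later text can use the objects it creates. The sketch below knits a two-line document from a character vector to show this; the chunk label and variable name are hypothetical, and the backticks are assembled with strrep() only to keep the example readable.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks that open and close a chunk
tick <- "`"              # one backtick for inline code

src <- c(
  paste0(fence, "{r hidden_setup, include=FALSE}"),
  "x <- 41 + 1",
  fence,
  "",
  paste0("The answer is ", tick, "r x", tick, ".")
)

# knit() returns the knitted markdown as text when given `text =`.
out <- knit(text = src, quiet = TRUE)
cat(out)
```

The knitted text contains the computed value but not the assignment that produced it.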

    Code chunks

    message: show messages?

    What’s written:

    ```{r, message=FALSE}
    message("This is a test message")
    warning("This is a test warning")
    colnames(cars)
    ```

    Output:

    message("This is a test message")
    warning("This is a test warning")
    Warning: This is a test warning
    colnames(cars)
    [1] "speed" "dist" 

    Code chunks

    warning: show warnings?

    What’s written:

    ```{r, warning=FALSE}
    message("This is a test message")
    warning("This is a test warning")
    colnames(cars)
    ```

    Output:

    message("This is a test message")
    This is a test message
    warning("This is a test warning")
    colnames(cars)
    [1] "speed" "dist" 

    Be careful when hiding warnings: do it only for the final draft

    Inline-code

    Number of rows in cars dataset =

    nrow(cars)
    [1] 50
    Number of rows in `cars` dataset = `r nrow(cars)`

    Number of rows in cars dataset = 50

    Inline-code

    _ genes from _ patients

    dim(rna_count_matrix)
    [1] 3000  24
    `r dim(rna_count_matrix)[1]` genes from `r dim(rna_count_matrix)[2]` patients

    3000 genes from 24 patients

    Inline-code

    One plus One = `r 1 + 1`

    One plus One = 2


    The cars have a mean speed of `r mean(cars$speed)` mph.

    The cars have a mean speed of 15.4 mph.
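
Inline results often have more decimal places than a sentence needs. This is a minimal sketch of two base R helpers that can be used inside inline code; the p-value is made up purely to show the formatting.

```r
mean_speed <- mean(cars$speed)

# round() keeps inline numbers short; one decimal place here.
round(mean_speed, 1)

# format.pval() reports values below `eps` as "<eps" instead of a long decimal.
format.pval(0.0000123, digits = 2, eps = 0.001)
```

In the document these would be written as `r round(mean_speed, 1)` inside a sentence.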

    date: `r Sys.Date()`

    2024-02-26

    ---
    title: "Document title"
    subtitle: "Add subtitle"
    date: "`r Sys.Date()`"
    output: html_document
    ---
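
Sys.Date() prints as year-month-day. A minimal sketch, assuming an English locale, of formatting the knit date like the hand-written "Feb 12, 2024" used in the earlier headers:

```r
# A fixed date, to show the format string.
format(as.Date("2024-02-12"), "%b %d, %Y")

# The same format applied to today's date, as it would be used in the header.
# Month abbreviations follow the session's locale.
format(Sys.Date(), "%b %d, %Y")
```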


    sessionInfo

    sessionInfo()
    R version 4.3.2 (2023-10-31)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 22.04.4 LTS
    
    Matrix products: default
    BLAS/LAPACK: /nix/store/yph2asi2jsbab21sqckalj6kgvd94jgf-blas-3/lib/libblas.so.3;  LAPACK version 3.12.0
    
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
     [9] LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
    
    time zone: America/Detroit
    tzcode source: system (glibc)
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    loaded via a namespace (and not attached):
     [1] compiler_4.3.2  fastmap_1.1.1   cli_3.6.2       tools_4.3.2    
     [5] htmltools_0.5.7 yaml_2.3.8      rmarkdown_2.25  knitr_1.45     
     [9] jsonlite_1.8.8  xfun_0.41       digest_0.6.33   rlang_1.1.2    
    [13] evaluate_0.23  

    Practice

    • Use example.rmd to make a report using palmerpenguins

    • Plot body weight against sex and add a caption

      • hint: it's a chunk option

    • Test whether there is a difference in mean body weight between male and female penguins

    • Write a short conclusion for the analysis

      Using inline code:

      • report the means for the two groups
      • report the p-value

    • Add sessionInfo() at the end

    • Knit to HTML, PDF and Word

    install.packages('palmerpenguins')

    Homework: run another analysis between two variables in the dataset and report the results

    Upload the final Rmd to GitHub, with your name in the filename, under session5_rmarkdown/homework

    Tour of RStudio